Musical fingerprinting based on onset intervals

ABSTRACT

Methods, computing devices, and machine readable storage media for generating a fingerprint of a music sample. The music sample may be filtered into a plurality of frequency bands. Onsets in each of the frequency bands may be independently detected. Inter-onset intervals between pairs of onsets within the same frequency band may be determined. At least one code associated with each onset may be generated, each code comprising a frequency band identifier identifying a frequency band in which the associated onset occurred and one or more inter-onset intervals. Each code may be associated with a timestamp indicating when the associated onset occurred within the music sample. All generated codes and the associated timestamps may be combined to form the fingerprint.

NOTICE OF COPYRIGHTS AND TRADE DRESS

A portion of the disclosure of this patent document contains material which is subject to copyright protection. This patent document may show and/or describe matter which is or may become trade dress of the owner. The copyright and trade dress owner has no objection to the facsimile reproduction by anyone of the patent disclosure as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright and trade dress rights whatsoever.

BACKGROUND

1. Field

This disclosure relates to developing a fingerprint of an audio sample and identifying the sample based on the fingerprint.

2. Description of the Related Art

The “fingerprinting” of large audio files is becoming a necessary feature for any large scale music understanding service or system. “Fingerprinting” is defined herein as converting an unknown music sample, represented as a series of time-domain samples, to a match of a known song, which may be represented by a song identification (ID). The song ID may be used to identify metadata (song title, artist, etc.) and one or more recorded tracks containing the identified song (which may include tracks of different bit rate, compression type, file type, etc.). The term “song” refers to a musical performance as a whole, and the term “track” refers to a specific embodiment of the song in a digital file. Note that, in the case where a specific musical composition is recorded multiple times by the same or different artists, each recording is considered a different “song”. The term “music sample” refers to audio content presented as a set of digitized samples. A music sample may be all or a portion of a track, or may be all or a portion of a song recorded from a live performance or from an over-the-air broadcast.

Examples of fingerprinting have been published by Haitsma and Kalker (A highly robust audio fingerprinting system with an efficient search strategy, Journal of New Music Research, 32(2):211-221, 2003), Wang (An industrial strength audio search algorithm, International Conference on Music Information Retrieval (ISMIR) 2003), and Ellis, Whitman, Jehan, and Lamere (The Echo Nest musical fingerprint, International Conference on Music Information Retrieval (ISMIR) 2010).

Fingerprinting generally involves compressing a music sample to a code, which may be termed a “fingerprint”, and then using the code to identify the music sample within a database or index of songs.

DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flow chart of a process for generating a fingerprint of a music sample.

FIG. 2 is a flow chart of a process for adaptive onset detection.

FIG. 3 is a flow chart of another process for adaptive onset detection.

FIG. 4 is a graphical representation of a code.

FIG. 5 is a graphical representation of onset interval pairs.

FIG. 6 is a flow chart of a process for recognizing music based on a fingerprint.

FIG. 7 is a graphical representation of an inverted index.

FIG. 8 is a block diagram of a system for fingerprinting music samples.

FIG. 9 is a block diagram of a computing device.

Elements in figures are assigned three-digit reference designators, wherein the most significant digit is the figure number where the element was introduced. Elements not described in conjunction with a figure may be presumed to have the same form and function as a previously described element having the same reference designator.

DETAILED DESCRIPTION

Description of Processes

FIG. 1 shows a flow chart of a process 100 for generating a fingerprint representing the content of a music sample. The process 100 may begin at 110, when the music sample is provided as a series of digitized time-domain samples, and may end at 190 after a fingerprint of the music sample has been generated. The process 100 may provide a robust, reliable fingerprint of the music sample based on the relative timing of successive onsets, or beat-like events, within the music sample. In contrast, previous musical fingerprints typically relied upon spectral features of the music sample in addition to, or instead of, temporal features like onsets.

At 120, the music sample may be “whitened” to suppress strong stationary resonances that may be present in the music sample. Such resonances may be, for example, artifacts of the speaker, microphone, room acoustics, and other factors when the music sample is recorded from a live performance or from an over-the-air broadcast. “Whitening” is a process that flattens the spectrum of a signal such that the signal more closely resembles white noise (hence the name “whitening”).

At 120, the time-varying frequency spectrum of the music sample may be estimated. The music sample may then be filtered using a time-varying inverse filter calculated from the frequency spectrum to flatten the spectrum of the music sample and thus moderate any strong resonances. For example, at 120, a linear predictive coding (LPC) filter may be estimated from the autocorrelation of one-second blocks of the music sample, using a decay constant of eight seconds. An inverse finite impulse response (FIR) filter may then be calculated from the LPC filter. The music sample may then be filtered using the FIR filter. Each strong resonance in the music sample may thus be moderated by a corresponding zero in the FIR filter.
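The whitening of a single block can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the function names, the filter order, and the use of the Levinson-Durbin recursion are assumptions, and the eight-second decay constant that would smooth the autocorrelation across successive blocks is omitted for brevity.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation of x for lags 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc_coefficients(r, order):
    """Levinson-Durbin recursion: predictor a[1..order] from autocorrelation r."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable block
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def whiten_block(x, order=8):
    """Filter x with the inverse FIR filter [1, -a1, ..., -ap] (assumed order)."""
    a = lpc_coefficients(autocorrelation(x, order), order)
    fir = [1.0] + [-c for c in a]
    return [
        sum(fir[k] * x[n - k] for k in range(len(fir)) if n - k >= 0)
        for n in range(len(x))
    ]
```

Each zero of the inverse filter sits where the LPC model found a resonance, so the filtered block has a flatter spectrum than the input block.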

At 130, the whitened music sample may be partitioned into a plurality of frequency bands using a corresponding plurality of band-pass filters. Ideally, each band may have sufficient bandwidth to allow accurate measurement of the timing of the music signal (since temporal resolution has an inverse relationship with bandwidth). At the same time, the probability that a band will be corrupted by environmental noise or channel effects increases with bandwidth. Thus the number of bands and the bandwidths of each band may be determined as a compromise between temporal resolution and a desire to obtain multiple uncorrupted views of the music sample.

For example, at 130, the music sample may be filtered using the lowest eight filters of the MPEG-Audio 32-band filter bank to provide eight frequency bands spanning the frequency range from 0 to about 5500 Hertz. More or fewer than eight bands, spanning a narrower or wider frequency range, may be used. The output of the filtering will be referred to herein as “filtered music samples”, with the understanding that each filtered music sample is a series of time-domain samples representing the magnitude of the music sample within the corresponding frequency band.
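The figure of about 5500 Hertz can be checked with a short calculation. The sketch below assumes a 44,100 samples-per-second input and 32 equal-width bands covering 0 to half the sample rate; the function name is hypothetical and no actual filtering is performed.

```python
def mpeg_band_edges(sample_rate_hz=44100, total_bands=32, bands_used=8):
    """Return (low, high) edges in Hertz for the lowest `bands_used` bands."""
    width = sample_rate_hz / 2 / total_bands  # equal-width bands up to Nyquist
    return [(i * width, (i + 1) * width) for i in range(bands_used)]


edges = mpeg_band_edges()
top_of_eighth_band_hz = edges[-1][1]  # 8 * 689.0625 Hz = 5512.5 Hz
```

Under these assumptions each band is roughly 689 Hertz wide, and the eighth band ends at 5512.5 Hertz.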

At 140, onsets within each filtered music sample may be detected. An “onset” is the start of a period of increased magnitude of the music sample, such as the start of a musical note or percussion beat. Onsets may be detected using a detector for each frequency band. Each detector may detect increases in the magnitude of the music sample within its respective frequency band. Each detector may detect onsets, for example, by comparing the magnitude of the corresponding filtered music sample with a fixed or time-varying threshold derived from the current and past magnitude within the respective band.

At 150, a timestamp may be associated with each onset detected at 140. Each timestamp may indicate when the associated onset occurs within the music sample, which is to say the time delay from the start of the music sample until the occurrence of the associated onset. Since extreme precision is not necessarily required for comparing music samples, each timestamp may be quantized in time intervals that reduce the amount of memory required to store timestamps within a fingerprint, but are still reasonably small with respect to the anticipated minimum inter-onset interval. For example, the timestamps may be quantized in units of 23.2 milliseconds, which is equivalent to 1024 sample intervals if the audio sample was digitized at a conventional rate of 44,100 samples per second. In this case, assuming a maximum music sample length of about 47 seconds, each timestamp may be expressed as an eleven-bit binary number.
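The quantization arithmetic can be illustrated with a short sketch. The constant and function names are hypothetical, and the modulo operation models the wrap-around of an eleven-bit timestamp as an assumption.

```python
SAMPLE_RATE_HZ = 44100
HOP_SAMPLES = 1024      # one timestamp unit = 1024 sample intervals
TIMESTAMP_BITS = 11


def quantize_timestamp(onset_sample_index):
    """Map an onset's sample index to an 11-bit timestamp (wraps at 2048 units)."""
    return (onset_sample_index // HOP_SAMPLES) % (1 << TIMESTAMP_BITS)


unit_ms = 1000 * HOP_SAMPLES / SAMPLE_RATE_HZ                   # about 23.2 ms
wrap_s = (1 << TIMESTAMP_BITS) * HOP_SAMPLES / SAMPLE_RATE_HZ   # roughly 47.5 s
```

An onset one second into the sample (sample index 44,100) falls in unit 43, and the eleven-bit range is exhausted after 2048 units, or roughly 47.5 seconds.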

The fingerprint being generated by the process 100 is based on the relative location of onsets within the music sample. The fingerprint may subsequently be used to search a music library database containing a plurality of similarly-generated fingerprints of known songs. Since the music sample will be compared to the known songs based on the relative, rather than absolute, timing of onsets, the length of a music sample may exceed the presumed maximum sample length (such that the timestamps assigned at 150 “wrap around” and restart at zero) without significantly degrading the accuracy of the comparison.

At 160, inter-onset intervals (IOIs) may be determined. Each IOI may be the difference between the timestamps associated with two onsets within the same frequency band. IOIs may be calculated, for example, between each onset and the first succeeding onset, between each onset and the second succeeding onset, or between other pairs of onsets.

IOIs may be quantized in time intervals that are reasonably small with respect to the anticipated minimum inter-onset interval. The quantization of the IOIs may be the same as the quantization of the timestamps associated with each onset at 150. Alternatively, IOIs may be quantized in first time units and the timestamps may be quantized in longer time units to reduce the number of bits required for each timestamp. For example, IOIs may be quantized in units of 23.2 milliseconds, and the timestamps may be quantized in longer time units such as 46.4 milliseconds or 92.8 milliseconds. Assuming an average onset rate of about one onset per second, each inter-onset interval may be expressed as a six or seven bit binary number.
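A minimal sketch of the IOI calculation for one frequency band follows. It assumes the onset timestamps are already quantized and sorted, uses a hypothetical function name, and clamps each IOI to the largest value that fits in the chosen bit width; the clamping is an assumption, since the text does not say how over-long intervals are handled.

```python
def inter_onset_intervals(timestamps, lookahead=2, ioi_bits=6):
    """For each onset in one band, return its IOIs to the next `lookahead` onsets."""
    max_ioi = (1 << ioi_bits) - 1  # 63 units for a six-bit IOI
    result = []
    for i, t in enumerate(timestamps):
        following = timestamps[i + 1 : i + 1 + lookahead]
        # Assumed behavior: saturate intervals that exceed the bit width.
        result.append((t, [min(later - t, max_ioi) for later in following]))
    return result


# Onsets at units 0, 10, 25, and 100 in a single band.
example = inter_onset_intervals([0, 10, 25, 100])
```

With units of 23.2 milliseconds, a six-bit IOI spans about 1.5 seconds and a seven-bit IOI about 3 seconds, which is consistent with an average rate near one onset per second.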

At 170, one or more codes may be associated with some or all of the onsets detected at 140. Each code may include one or more IOIs indicating the time interval between the associated onset and a subsequent onset. Each code may also include a frequency band identifier indicating the frequency band in which the associated onset occurred. For example, when the music sample is filtered into eight frequency bands at 130 in the process 100, the frequency band identifier may be a three-bit binary number. Each code may be associated with the timestamp associated with the corresponding onset.

At 170, multiple codes may be associated with each onset. For example, two, three, six, or more codes may be associated with each onset. Each code associated with a given onset may be associated with the same timestamp and may include the same frequency band identifier. Multiple codes associated with the same onset may contain different IOIs or combinations of IOIs. For example, three codes may be generated that include the IOIs from the associated onset to each of the next three onsets in the same frequency band, respectively.

At 180, the codes determined at 170 may be combined to form a fingerprint of the music sample. The fingerprint may be a list of all of the codes generated at 170 and the associated timestamps. The codes may be listed in timestamp order, in timestamp order by frequency band, or in some other order. The ordering of the codes may not be relevant to the use of the fingerprint. The fingerprint may be stored and/or transmitted over a network before the process 100 ends at 190.
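The actions at 170 and 180 can be sketched together. In this hypothetical example each code is a (band identifier, IOI) tuple holding a single IOI, three codes are generated per onset, and the fingerprint is a list of (code, timestamp) pairs in timestamp order; all names and the tuple representation are assumptions made for illustration.

```python
def build_fingerprint(onsets_by_band, codes_per_onset=3):
    """onsets_by_band: {band_id: sorted list of quantized onset timestamps}."""
    fingerprint = []
    for band, times in onsets_by_band.items():
        for i, t0 in enumerate(times):
            # One code per IOI from this onset to each of the next few onsets.
            for later in times[i + 1 : i + 1 + codes_per_onset]:
                code = (band, later - t0)        # band identifier + one IOI
                fingerprint.append((code, t0))   # code paired with its timestamp
    fingerprint.sort(key=lambda entry: entry[1])  # timestamp order (optional)
    return fingerprint


fp = build_fingerprint({0: [0, 10, 30], 1: [5, 20]})
```

The final onset in each band produces no code here because no later onset is available to form an IOI.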

Referring now to FIG. 2, a method of detecting onsets 200 may be suitable for use at 140 in the process 100 of FIG. 1. The method 200 may be performed independently and concurrently for each of the plurality of filtered music samples from 130 in FIG. 1. At 210, a magnitude of a filtered music sample may be compared to an adaptive threshold 255. In this context, an “adaptive threshold” is a threshold that varies or adapts in response to one or more characteristics of the filtered music sample. An onset may be detected at 210 each time the magnitude of the filtered music sample rises above the adaptive threshold. To reduce susceptibility to noise in the original music sample, an onset may be detected at 210 only when the magnitude of the filtered music sample rises above the adaptive threshold for a predetermined period of time.

At 230, the filtered music sample may be low-pass filtered to effectively provide a recent average magnitude of the filtered music sample 235. At 240, onset intervals determined at 160 based on onsets detected at 210 may be low-pass filtered to effectively provide a recent average inter-onset interval 245. At 250, the adaptive threshold may be adjusted in response to the recent average magnitude of the filtered music sample 235 and/or the recent average inter-onset interval 245, and/or some other characteristic of, or derived from, the filtered music sample.

Referring now to FIG. 3, another method of detecting onsets 300 may be suitable for use at 140 in the process 100 of FIG. 1. The method 300 may be performed independently and concurrently for each of the plurality of filtered music samples from 130 in FIG. 1. At 310, a magnitude of a filtered music sample may be compared to a decaying threshold 355, which is to say a threshold that becomes progressively lower in value over time. An onset may be detected at 310 each time the magnitude of the filtered music sample rises above the decaying threshold 355. To reduce susceptibility to noise in the original music sample, an onset may be detected at 310 only when the magnitude of the filtered music sample rises above the decaying threshold 355 for a predetermined period of time.

When an onset is detected at 310, the decaying threshold 355 may be reset to a higher value. Functionally, the decaying threshold 355 may be considered to be reset in response to a reset signal 315 provided from 310. The decaying threshold 355 may be reset to a value that adapts to the magnitude of the filtered music sample. For example, the decaying threshold 355 may be reset to a value higher, such as five percent or ten percent higher, than a peak magnitude of the filtered music sample following each onset detected at 310.

At 320, onset intervals determined at 160 from onsets detected at 310 may be low-pass filtered to effectively provide a recent average inter-onset interval 325. At 330, the recent average inter-onset interval 325 may be compared to a target value derived from a target onset rate. For example, the recent average inter-onset interval 325 may be inverted to determine a recent average onset rate that is compared to a target onset rate of one onset per second, two onsets per second, or some other predetermined target onset rate. When a determination is made at 330 that the recent average inter-onset interval 325 is too short (average onset rate higher than the predetermined target onset rate), the decay rate of the decaying threshold 355 may be reduced at 345. Reducing the decay rate will cause the decaying threshold value to change more slowly, which may increase the intervals between successive onset detections. When a determination is made at 330 that the recent average inter-onset interval 325 is too long (average onset rate lower than the predetermined target onset rate), the decay rate of the decaying threshold 355 may be increased at 340. Increasing the decay rate will cause the decaying threshold value to change more quickly, which may decrease the intervals between successive onset detections.
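The method 300 can be sketched for a single band as shown below. This is an illustrative sketch under several assumptions: the magnitude is sampled once per 23.2 millisecond unit, so a target interval of 43 units is about one second; the threshold decays by a multiplicative factor per unit; the low-pass filter is a one-pole smoother; and the requirement that the magnitude stay above the threshold for a predetermined period is omitted. All names and parameter values are hypothetical.

```python
def detect_onsets(magnitudes, target_ioi=43.0, decay=0.98, overshoot=1.05,
                  smoothing=0.9, adapt=0.01):
    """Return indices where the magnitude rises above a decaying threshold."""
    threshold = 0.0
    onsets = []
    avg_ioi = target_ioi   # recent average inter-onset interval
    rising = False         # True while climbing toward the peak after an onset
    prev = 0.0
    for n, mag in enumerate(magnitudes):
        if rising and mag >= prev:
            threshold = overshoot * mag       # keep the threshold above the peak
        elif mag > threshold:
            if onsets:
                # Low-pass filter the inter-onset intervals.
                avg_ioi = smoothing * avg_ioi + (1 - smoothing) * (n - onsets[-1])
                if avg_ioi < target_ioi:
                    # Onsets too frequent: reduce the decay rate.
                    decay = min(decay + adapt * (1 - decay), 0.9999)
                else:
                    # Onsets too sparse: increase the decay rate.
                    decay = max(decay - adapt * (1 - decay), 0.5)
            onsets.append(n)
            threshold = overshoot * mag       # reset five percent above magnitude
            rising = True
        else:
            rising = False
            threshold *= decay                # threshold decays between onsets
        prev = mag
    return onsets
```

A decay factor closer to one corresponds to a lower decay rate, so the threshold falls more slowly and weaker events shortly after an onset are ignored.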

The target onset rate may be determined as a compromise between the accuracy with which a music sample can be matched to a song from a music library, and the computing resources required to store the music library and perform the matching. A higher target onset rate leads to more detailed descriptions of each music sample and song, and thus provides more accurate matching. However, a higher target onset rate results in a slower, more computationally intensive matching process and a proportionally larger music library. A rate of about one onset per second may be a good compromise.

Referring now to FIG. 4, a code 400, which may be a code generated at 170 in the process 100 of FIG. 1, may include a frequency band identifier 402, a first IOI 404, and a second IOI 406. The code 400 may be associated with a timestamp 408. The frequency band identifier 402 may identify the frequency band in which an associated onset occurred. The first IOI 404 may indicate the time interval between the associated onset and a selected subsequent onset, which may not necessarily be the next onset within the same frequency band. The second IOI 406 may indicate the time interval between a pair of onsets subsequent to the associated onset within the same frequency band. The order of the fields in the code 400 is exemplary, and other arrangements of the fields are possible.

The frequency band identifier 402, the first IOI 404, and the second IOI 406 may contain a total of n binary bits, where n is a positive integer. n may typically be in the range of 13-18. For example, the code 400 may include a 3-bit frequency band identifier and two 6-bit IOIs for a total of fifteen bits. Not all of the possible values of the n bits may be found in any given music sample. For example, typical music samples may have few, if any, IOI values within the lower half or lower one-third of the possible range of IOI values. Since not all possible combinations of the n bits are used, it may be possible to compress each code 400 using a hash function 410 to produce a compressed code 420. In this context, a “hash function” is any mathematical manipulation that compresses a binary string into a shorter binary string. Since the compressed codes will be incorporated into a fingerprint used to identify, but not reproduce, a music sample, the hash function 410 need not be reversible. The hash function 410 may be applied to the binary string formed by the frequency band identifier 402, the first IOI 404, and the second IOI 406 to generate the compressed code 420. The timestamp 408 may be preserved and associated with the compressed code 420.
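The fifteen-bit example can be sketched as a bit-packing routine. The field order follows the order in which the fields are listed above, which the text notes is only exemplary. The hash shown is a generic multiplicative hash chosen purely for illustration; the text does not specify the hash function 410, and the twelve-bit output width is likewise an assumption.

```python
BAND_BITS, IOI_BITS = 3, 6


def pack_code(band, ioi1, ioi2):
    """Pack a 3-bit band identifier and two 6-bit IOIs into a 15-bit integer."""
    if not 0 <= band < (1 << BAND_BITS):
        raise ValueError("band identifier out of range")
    if not (0 <= ioi1 < (1 << IOI_BITS) and 0 <= ioi2 < (1 << IOI_BITS)):
        raise ValueError("IOI out of range")
    return (band << (2 * IOI_BITS)) | (ioi1 << IOI_BITS) | ioi2


def compress_code(code, out_bits=12):
    """Hypothetical non-reversible hash reducing a code to `out_bits` bits."""
    return ((code * 2654435761) & 0xFFFFFFFF) >> (32 - out_bits)


packed = pack_code(5, 10, 20)   # (5 << 12) | (10 << 6) | 20 = 21140
```

The timestamp is not part of the packed value; it would be stored alongside the code, whether compressed or not.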

FIG. 5 is a graphical representation of an exemplary set of six codes that may be associated with a specific onset. For purposes of discussion, assume that the specific onset occurs at a time t0 and subsequent onsets in the same frequency band occur at times t1, t2, t3, and t4. The identifiers t0-t4 refer both to the time when the onsets occurred and the timestamps assigned to the respective onsets. Six codes, identified as “Code A” through “Code F”, may be generated for the specific onset. Each code may have the format of the code 400 of FIG. 4. Each code may include a first IOI indicating the time interval from t0 to a first subsequent onset and a second IOI indicating the time interval from the first subsequent onset to a second subsequent onset. The first subsequent onset and the second subsequent onset may be selected from all possible pairs of the four onsets following the onset at t0. Each of the six codes (Code A-Code F) may also include a frequency band identifier (not shown) and may be associated with timestamp t0.

Code A may contain the IOI from t0 to t1, and the IOI from t1 to t2. Code B may contain the IOI from t0 to t1, and the IOI from t1 to t3. Code C may contain the IOI from t0 to t1, and the IOI from t1 to t4. Code D may contain the IOI from t0 to t2, and the IOI from t2 to t3. Code E may contain the IOI from t0 to t2, and the IOI from t2 to t4. Code F may contain the IOI from t0 to t3, and the IOI from t3 to t4.
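The six codes correspond to the six ways of choosing an ordered pair from the four onsets following t0, with the second IOI measured from the first selected onset to the second. A minimal sketch, with a hypothetical function name and example timestamps chosen for illustration:

```python
from itertools import combinations


def codes_for_onset(times):
    """times = [t0, t1, t2, t3, t4] in one band; returns (first IOI, second IOI) pairs."""
    t0, later = times[0], times[1:]
    # combinations() yields (t1,t2), (t1,t3), (t1,t4), (t2,t3), (t2,t4), (t3,t4).
    return [(a - t0, b - a) for a, b in combinations(later, 2)]


# Example timestamps t0..t4 = 0, 10, 25, 30, 50 (in quantized units).
labeled = dict(zip("ABCDEF", codes_for_onset([0, 10, 25, 30, 50])))
```

For these timestamps Code A is (10, 15), Code D is (25, 5), and Code F is (30, 20).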

Referring now to FIG. 6, a process 600 for identifying a song based on a fingerprint may begin at 610 when the fingerprint is provided. The fingerprint may have been derived from an unknown music sample using, for example, the process 100 shown in FIG. 1. The process 600 may finish at 690 after a single song from a library of songs has been identified.

The fingerprint provided at 610 may contain a plurality of codes (which may be compressed or uncompressed) representing the unknown music sample. Each code may be associated with a timestamp. At 620, a first code from the plurality of codes may be selected. At 630, the selected code may be used to access an inverted index for a music library containing a large plurality of songs.

Referring now to FIG. 7, an inverted index 700 may be suitable for use at 630 in the process 600. The inverted index 700 may include a respective list, such as the list 710, for each possible code value. The code values used in the inverted index may be compressed or uncompressed, so long as the inverted index is consistent with the type of codes within the fingerprint. Continuing the previous example, in which the music sample is represented by a plurality of 15-bit codes, the inverted index 700 may include 2¹⁵ lists of reference samples. The list associated with each code value may contain the reference sample ID 720 of each reference sample in the music library that contains the code value. Each reference sample may be all or a portion of a track in the music library. For example, each track in the music library may be divided into overlapping 30-second reference samples. Each track in the music library may be partitioned into reference samples in some other manner.

The reference sample ID may be an index number or other identifier that allows the track that contained the reference sample to be identified. The list associated with each code value may also contain an offset time 730 indicating where the code value occurs within the identified reference sample. In situations where a reference sample contains multiple segments having the same code value, multiple offset times may be associated with the reference sample ID.
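The structure of the inverted index can be sketched with a dictionary that maps each code value to a list of (reference sample ID, offset time) entries. A sparse dictionary is used here instead of 2¹⁵ preallocated lists; that choice, the function name, and the example identifiers are assumptions made for illustration.

```python
from collections import defaultdict


def build_inverted_index(reference_fingerprints):
    """reference_fingerprints: {reference_sample_id: [(code, offset_time), ...]}."""
    index = defaultdict(list)
    for ref_id, entries in reference_fingerprints.items():
        for code, offset in entries:
            # One entry per occurrence, so a repeated code keeps every offset.
            index[code].append((ref_id, offset))
    return index


index = build_inverted_index({
    "ref1": [(100, 0), (200, 5), (100, 9)],   # code 100 occurs twice in ref1
    "ref2": [(100, 3)],
})
```

Looking up code 100 in this example returns both offsets within "ref1" as well as the single offset within "ref2".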

Referring back to FIG. 6, an inverted index, such as the inverted index 700, may be populated at 635 by applying the process 100, as shown in FIG. 1, to reference samples drawn from some or all tracks in a library containing a large plurality of tracks. In the situation where the library contains multiple tracks of the same song, a representative track may be used to populate the inverted index. The process used at 635 to generate fingerprints for the reference samples may not necessarily be the same as the process used to generate the music sample fingerprint. The number and bandwidth of the filter bands and the target onset rate used to generate fingerprints of the reference samples and the music sample may be the same. However, since the fingerprints of the reference samples may be generated from an uncorrupted source, such as a CD track, the number of codes generated for each onset may be smaller for the reference tracks than for the music sample.

At 640, a code match histogram may be developed. The code match histogram may be a list of all of the reference sample IDs for reference samples that match at least one code from the fingerprint and a count value associated with each listed reference sample ID indicating how many codes from the fingerprint matched that reference sample.

At 650, a determination may be made if more codes from the fingerprint should be considered. When there are more codes to consider, the actions from 620 to 650 may be repeated cyclically for each code. Specifically, at 630 each additional code may be used to access the inverted index. At 640, the code match histogram may be updated to reflect the reference samples that match the additional codes.

The actions from 620 to 650 may be repeated cyclically until all codes contained in the fingerprint have been processed. Alternatively, the actions from 620 to 650 may be repeated until either all codes from the fingerprint have been processed or a predetermined maximum number of codes have been processed. As a further alternative, the actions from 620 to 650 may be repeated until all codes from the fingerprint have been processed or until the histogram built at 640 indicates a clear match between the music sample and one of the reference samples. The determination at 650 whether or not to process additional codes may be made in some other manner.
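The loop from 620 to 650 can be sketched as follows, assuming the inverted index is a mapping from code value to a list of (reference sample ID, offset time) entries and the fingerprint is a list of (code, timestamp) pairs. The function name, the optional code limit, and the choice to count a reference sample at most once per fingerprint code are assumptions.

```python
from collections import Counter


def code_match_histogram(fingerprint, index, max_codes=None):
    """Count, per reference sample ID, how many fingerprint codes it contains."""
    histogram = Counter()
    for n, (code, _timestamp) in enumerate(fingerprint):
        if max_codes is not None and n >= max_codes:
            break  # predetermined maximum number of codes reached
        # A reference sample counts once per fingerprint code, even if the
        # code occurs at several offsets within it.
        for ref_id in {ref for ref, _offset in index.get(code, [])}:
            histogram[ref_id] += 1
    return histogram


example_index = {100: [("ref1", 0), ("ref1", 9), ("ref2", 3)], 200: [("ref1", 5)]}
hist = code_match_histogram([(100, 2), (200, 7), (300, 8)], example_index)
```

In this example "ref1" matches two of the three codes and "ref2" matches one; code 300 matches nothing in the index.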

When a determination is made at 650 that no more codes should be processed, one or more best matches may be identified at 660. In the simplest case, one reference sample may match all or nearly all of the codes from the fingerprint, and no other reference sample may match more than a small fraction of the codes. In this case, the unknown music sample may be identified as a portion of the single track that contains the reference sample that matched all or nearly all of the codes. In the more complex case, two or more candidate reference samples may match a significant portion of the codes from the fingerprint, such that a single reference sample matching the unknown music sample cannot be immediately identified. The determination whether one or more reference samples match the unknown music sample may be made based on predetermined thresholds. The height of the highest peak in the histogram may provide a confidence factor indicating a confidence level in the match. The confidence factor may be derived from the absolute height or the number of matches of the highest peak. The confidence factor may be derived from the relative height (number of matches in the highest peak divided by a total number of matches in the histogram) of the highest peak. In some situations, for example when no reference sample matches more than a predetermined fraction of the codes from the music sample, a determination may be made that no track in the music library matches the unknown music sample.

When only a single reference sample matches the music sample, the process 600 may end at 690. When two or more candidate reference samples are determined to possibly match the music sample, the process 600 may continue at 670. At 670, a time-difference histogram may be created for each candidate reference sample. For each candidate reference sample, the difference between the associated timestamp from the fingerprint and the offset time from the inverted index may be determined for each matching code and a histogram may be created from the time-difference values. When the unknown music sample and a candidate reference sample actually match, the histogram may have a pronounced peak. Note that the peak may not be at time=0 because the start of the unknown music sample may not coincide with the start of the reference sample. When a candidate reference sample does not, in fact, match the unknown music sample, the corresponding time-difference histogram may not have a pronounced peak. At 680, the time-difference histogram having the highest peak value may be determined, and the track containing the best-matching reference sample may be selected as the best match to the unknown music sample. The process 600 may then finish at 690.
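The actions at 670 and 680 can be sketched as follows. The sketch assumes that timestamps and offset times use the same quantization, ignores timestamp wrap-around, and takes the difference as offset time minus timestamp (the sign convention is arbitrary). The function name and data layout are hypothetical.

```python
from collections import Counter


def best_time_offset_match(fingerprint, index, candidates):
    """Return (reference_sample_id, peak_height) with the strongest offset peak."""
    best = (None, 0)
    for ref_id in candidates:
        diffs = Counter()  # histogram of time differences for this candidate
        for code, timestamp in fingerprint:
            for ref, offset in index.get(code, []):
                if ref == ref_id:
                    diffs[offset - timestamp] += 1
        peak = max(diffs.values(), default=0)
        if peak > best[1]:
            best = (ref_id, peak)
    return best


offsets_index = {
    1: [("a", 10), ("b", 4)],
    2: [("a", 15), ("b", 30)],
    3: [("a", 22), ("b", 7)],
}
match = best_time_offset_match([(1, 0), (2, 5), (3, 12)], offsets_index, ["a", "b"])
```

Every code in this example matches both candidates, but only candidate "a" places all three codes at a consistent time difference of 10 units, so its histogram has a peak of height three while candidate "b" has no peak above one.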

Description of Apparatus

Referring now to FIG. 8, a system 800 for audio fingerprinting may include a client computer 810, and a server 820 coupled via a network 890. The network 890 may be or include the Internet. Although FIG. 8 shows, for ease of explanation, a single client computer and a single server, it must be understood that a large plurality of client computers may be in communication with the server 820 concurrently, and that the server 820 may comprise a plurality of servers, a server cluster, or a virtual server within a cloud.

Although shown as a portable computer, the client computer 810 may be any computing device including, but not limited to, a desktop personal computer, a portable computer, a laptop computer, a computing tablet, a set top box, a video game system, a personal music player, a telephone, or a personal digital assistant. Each of the client computer 810 and the server 820 may be a computing device including at least one processor, memory, and a network interface. The server, in particular, may contain a plurality of processors. Each of the client computer 810 and the server 820 may include or be coupled to one or more storage devices. The client computer 810 may also include or be coupled to a display device and user input devices, such as a keyboard and mouse, not shown in FIG. 8.

Each of the client computer 810 and the server 820 may execute software instructions to perform the actions and methods described herein. The software instructions may be stored on a machine readable storage medium within a storage device. Machine readable storage media include, for example, magnetic media such as hard disks, floppy disks and tape; optical media such as compact disks (CD-ROM and CD-RW) and digital versatile disks (DVD and DVD±RW); flash memory cards; and other storage media. Within this patent, the term “storage medium” refers to a physical object capable of storing data. The term “storage medium” does not encompass transitory media, such as propagating signals or waveforms.

Each of the client computer 810 and the server 820 may run an operating system, including, for example, variations of the Linux, Microsoft Windows, Symbian, and Apple Mac operating systems. To access the Internet, the client computer may run a browser such as Microsoft Explorer or Mozilla Firefox, and an e-mail program such as Microsoft Outlook or Lotus Notes. Each of the client computer 810 and the server 820 may run one or more application programs to perform the actions and methods described herein.

The client computer 810 may be used by a “requestor” to send a query to the server 820 via the network 890. The query may request the server to identify an unknown music sample. The client computer 810 may generate a fingerprint of the unknown music sample and provide the fingerprint to the server 820 via the network 890. In this case, the process 100 of FIG. 1 may be performed by the client computer 810, and the process 600 of FIG. 6 may be performed by the server 820. Alternatively, the client computer may provide the music sample to the server as a series of time-domain samples, in which case the process 100 of FIG. 1 and the process 600 of FIG. 6 may be performed by the server 820.

FIG. 9 is a block diagram of a computing device 900 which may be suitable for use as the client computer 810 and/or the server 820 of FIG. 8. The computing device 900 may include a processor 910 coupled to memory 920 and a storage device 930. The processor 910 may include one or more microprocessor chips and supporting circuit devices. The storage device 930 may include a machine readable storage medium as previously described. The machine readable storage medium may store instructions that, when executed by the processor 910, cause the computing device 900 to perform some or all of the processes described herein.

The processor 910 may be coupled to a network 960, which may be or include the Internet, via a communications link 970. The processor 910 may be coupled to peripheral devices such as a display 940, a keyboard 950, and other devices that are not shown.

Closing Comments

Throughout this description, the embodiments and examples shown should be considered as exemplars, rather than limitations on the apparatus and procedures disclosed or claimed. Although many of the examples presented herein involve specific combinations of method acts or system elements, it should be understood that those acts and those elements may be combined in other ways to accomplish the same objectives. With regard to flowcharts, additional and fewer steps may be taken, and the steps as shown may be combined or further refined to achieve the methods described herein. Acts, elements and features discussed only in connection with one embodiment are not intended to be excluded from a similar role in other embodiments.

As used herein, “plurality” means two or more. As used herein, a “set” of items may include one or more of such items. As used herein, whether in the written description or the claims, the terms “comprising”, “including”, “carrying”, “having”, “containing”, “involving”, and the like are to be understood to be open-ended, i.e., to mean including but not limited to. Only the transitional phrases “consisting of” and “consisting essentially of”, respectively, are closed or semi-closed transitional phrases with respect to claims. Use of ordinal terms such as “first”, “second”, “third”, etc., in the claims to modify a claim element does not by itself connote any priority, precedence, or order of one claim element over another or the temporal order in which acts of a method are performed, but are used merely as labels to distinguish one claim element having a certain name from another element having a same name (but for use of the ordinal term) to distinguish the claim elements. As used herein, “and/or” means that the listed items are alternatives, but the alternatives also include any combination of the listed items.

It is claimed:
 1. A method for generating a fingerprint of a music sample, comprising: filtering the music sample into a plurality of frequency bands; independently detecting onsets in each of the frequency bands; determining inter-onset intervals between pairs of onsets within the same frequency band; generating at least one code associated with each onset, each code comprising a frequency band identifier identifying a frequency band in which the associated onset occurred and one or more inter-onset intervals; associating each code with a timestamp indicating when the associated onset occurred within the music sample; combining all generated codes and the associated timestamps to form the fingerprint.
 2. The method of claim 1, further comprising: whitening the music sample prior to filtering the music sample.
 3. The method of claim 1, wherein detecting onsets comprises, for each frequency band: comparing a magnitude of the music sample to an adaptive threshold.
 4. The method of claim 1, wherein generating at least one code associated with each onset further comprises: generating a first code containing an inter-onset interval indicating a time interval from an associated onset to a first subsequent onset; generating a second code containing an inter-onset interval indicating a time interval from the associated onset to a second subsequent onset different from the first subsequent onset.
 5. The method of claim 1, wherein generating at least one code associated with each onset further comprises: generating a code containing a first inter-onset interval indicating a time interval from an associated onset to a first subsequent onset and a second inter-onset interval indicating a time interval from the associated onset to a second subsequent onset different from the first subsequent onset.
 6. The method of claim 1, wherein generating at least one code associated with each onset further comprises: generating a code containing a first inter-onset interval indicating a time interval from an associated onset to a first subsequent onset and a second inter-onset interval indicating a time interval from the first subsequent onset to a second subsequent onset different from the first subsequent onset.
 7. The method of claim 6, wherein generating at least one code associated with each onset further comprises: generating six different codes, wherein the first subsequent onset and the second subsequent onset within the six codes are selected as all possible pairs of onsets from the four onsets immediately following the associated onset.
 8. A computing device for generating a fingerprint of a music sample, comprising: a processor; memory coupled to the processor; a storage device coupled to the processor, the storage device storing instructions that, when executed by the processor, cause the computing device to perform actions including: filtering the music sample into a plurality of frequency bands; independently detecting onsets in each of the frequency bands; determining inter-onset intervals between pairs of onsets within the same frequency band; generating at least one code associated with each onset, each code comprising a frequency band identifier identifying a frequency band in which the associated onset occurred and one or more inter-onset intervals; associating each code with a timestamp indicating when the associated onset occurred within the music sample; combining all generated codes and the associated timestamps to form the fingerprint.
 9. The computing device of claim 8, the actions performed further comprising: whitening the music sample prior to filtering the music sample.
 10. The computing device of claim 8, wherein detecting onsets comprises, for each frequency band: comparing a magnitude of the music sample to an adaptive threshold.
 11. The computing device of claim 8, wherein generating at least one code associated with each onset further comprises: generating a first code containing an inter-onset interval indicating a time interval from an associated onset to a first subsequent onset; generating a second code containing an inter-onset interval indicating a time interval from the associated onset to a second subsequent onset different from the first subsequent onset.
 12. The computing device of claim 8, wherein generating at least one code associated with each onset further comprises: generating a code containing a first inter-onset interval indicating a time interval from an associated onset to a first subsequent onset and a second inter-onset interval indicating a time interval from the associated onset to a second subsequent onset different from the first subsequent onset.
 13. The computing device of claim 8, wherein generating at least one code associated with each onset further comprises: generating a code containing a first inter-onset interval indicating a time interval from an associated onset to a first subsequent onset and a second inter-onset interval indicating a time interval from the first subsequent onset to a second subsequent onset different from the first subsequent onset.
 14. The computing device of claim 13, wherein generating at least one code associated with each onset further comprises: generating six different codes, wherein the first subsequent onset and the second subsequent onset within the six codes are selected as all possible pairs of onsets from the four onsets immediately following the associated onset.
 15. A machine readable storage medium storing instructions that, when executed by a computing device, cause the computing device to perform a process for generating a fingerprint of a music sample, the process comprising: filtering the music sample into a plurality of frequency bands; independently detecting onsets in each of the frequency bands; determining inter-onset intervals between pairs of onsets within the same frequency band; generating at least one code associated with each onset, each code comprising a frequency band identifier identifying a frequency band in which the associated onset occurred and one or more inter-onset intervals; associating each code with a timestamp indicating when the associated onset occurred within the music sample; combining all generated codes and the associated timestamps to form the fingerprint.
 16. The machine readable storage medium of claim 15, the process further comprising: whitening the music sample prior to filtering the music sample.
 17. The machine readable storage medium of claim 15, wherein detecting onsets comprises, for each frequency band: comparing a magnitude of the music sample to an adaptive threshold.
 18. The machine readable storage medium of claim 15, wherein generating at least one code associated with each onset further comprises: generating a first code containing an inter-onset interval indicating a time interval from an associated onset to a first subsequent onset; generating a second code containing an inter-onset interval indicating a time interval from the associated onset to a second subsequent onset different from the first subsequent onset.
 19. The machine readable storage medium of claim 15, wherein generating at least one code associated with each onset further comprises: generating a code containing a first inter-onset interval indicating a time interval from an associated onset to a first subsequent onset and a second inter-onset interval indicating a time interval from the associated onset to a second subsequent onset different from the first subsequent onset.
 20. The machine readable storage medium of claim 15, wherein generating at least one code associated with each onset further comprises: generating a code containing a first inter-onset interval indicating a time interval from an associated onset to a first subsequent onset and a second inter-onset interval indicating a time interval from the first subsequent onset to a second subsequent onset different from the first subsequent onset.
 21. The machine readable storage medium of claim 20, wherein generating at least one code associated with each onset further comprises: generating six different codes, wherein the first subsequent onset and the second subsequent onset within the six codes are selected as all possible pairs of onsets from the four onsets immediately following the associated onset.
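
As a non-limiting illustration only, the generation of six codes per onset recited in claims 6 and 7 (and the corresponding device and medium claims) might be sketched as follows. The function name codes_for_band, the tuple layout of a code, and the use of integer time units are assumptions made for this sketch and are not part of the claims. Onset times for a single frequency band are assumed to be already detected and sorted in ascending order.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# Assumed code layout: (band identifier, first interval, second interval).
CodeTuple = Tuple[int, int, int]


def codes_for_band(
    band: int, onset_times: Sequence[int], lookahead: int = 4
) -> List[Tuple[CodeTuple, int]]:
    """Return (code, timestamp) pairs for the onsets of one frequency band.

    For each onset, every pair drawn from the `lookahead` onsets that
    immediately follow it yields one code. With four following onsets there
    are six such pairs, hence six codes.
    """
    result = []
    for i, t0 in enumerate(onset_times):
        following = onset_times[i + 1 : i + 1 + lookahead]
        # combinations() preserves input order, so t1 always precedes t2.
        for t1, t2 in combinations(following, 2):
            first_interval = t1 - t0   # associated onset -> first subsequent onset
            second_interval = t2 - t1  # first subsequent -> second subsequent onset
            result.append(((band, first_interval, second_interval), t0))
    return result


# Usage: five onsets in band 3, times in arbitrary integer units.
pairs = codes_for_band(3, [0, 10, 25, 45, 70])
first_onset_codes = [code for code, timestamp in pairs if timestamp == 0]
```

In this sketch, onsets near the end of the sample have fewer than four following onsets and therefore produce fewer than six codes. How such trailing onsets are handled in practice is not specified here.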